Efficient Massively Parallel Transport Sweeps
Abstract
The full-domain “sweep,” in which all angular fluxes in a problem are calculated given previous-iterate values only for the volumetric source, forms the foundation for many iterative methods that have desirable properties [1]. One important property is that iteration counts do not grow with mesh refinement [1]. Solving the sweep on parallel machines is complicated by the dependence of each cell on its upstream neighbors. A simple task dependence graph (TDG) for a single quadrature direction in a 2D example (Fig. 1) illustrates the issue: tasks at a given level of the graph cannot be executed until some tasks on the previous level finish. The KBA algorithm [2] partitions the problem by assigning a column of cells to each processor, indicated by the four diagonal task groupings in Fig. 1. KBA parallelizes over planes perpendicular to the sweep direction, that is, over the breadth of the TDG. Early and late in a single-direction sweep, some processors are idle, as in stages 1-3 and 9-11 in Fig. 1. In this example, parallel efficiency for an isolated single-direction sweep could be no better than 8/11 ≈ 0.73: each of the four processors performs 8 tasks, but the sweep takes 8 + 3 = 11 stages. KBA does much better, because when a processor finishes its tasks for the first direction, it begins its tasks for the next direction in the octant pair with the same sweep ordering. That is, each processor begins a new TDG as soon as it completes its work on the previous TDG, until all directions in the octant pair finish. This effectively lengthens the “pipe” and increases efficiency. If there were 2M directions in the octant pair, the pipe length would be 2M × 8 in this example, and the efficiency could be up to (2M × 8)/(3 + 2M × 8). However, KBA’s pipe-fill penalty grows as processor count grows, even if cell count grows proportionally. The width of the TDG grows only sublinearly in P, so traditional KBA eventually runs out of parallelism to exploit. These issues fuel the common belief that sweeps cannot perform well in parallel beyond a few thousand processing elements.
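The stage counts and efficiency bounds above can be checked with a small schedule simulation. The sketch below is a simplified model, not the authors' code: it assumes each of the 4 processors in Fig. 1 owns one column of 8 cells, executes one cell task per stage, and that all 2M directions in the octant pair share the same sweep ordering, so a processor starts the next direction as soon as it finishes the previous one. The function names `kba_stage_schedule` and `sweep_efficiency` are hypothetical.

```python
from fractions import Fraction


def kba_stage_schedule(num_procs, cells_per_column, num_directions=1):
    """Map (processor, task index) -> stage for a pipelined KBA-style sweep.

    Simplified model (an assumption for illustration): a 2D grid where
    processor p owns column p, all directions in the octant pair share the
    same sweep ordering, and each processor executes one cell task per stage.
    Task k on a processor is cell k % cells_per_column of direction
    k // cells_per_column, so directions are processed back to back.
    """
    stage = {}
    tasks_per_proc = cells_per_column * num_directions
    for p in range(num_procs):
        for k in range(tasks_per_proc):
            earliest = 1  # stages are numbered from 1, as in Fig. 1
            if k > 0:
                # A processor executes its own tasks in order, one per stage.
                earliest = max(earliest, stage[(p, k - 1)] + 1)
            if p > 0:
                # The same cell row and direction in the upstream column must finish first.
                earliest = max(earliest, stage[(p - 1, k)] + 1)
            stage[(p, k)] = earliest
    return stage


def sweep_efficiency(num_procs, cells_per_column, num_directions=1):
    """Return (number of stages, parallel efficiency) for the model above."""
    stage = kba_stage_schedule(num_procs, cells_per_column, num_directions)
    num_stages = max(stage.values())
    busy_slots = len(stage)  # every task occupies one processor-stage slot
    return num_stages, Fraction(busy_slots, num_procs * num_stages)


if __name__ == "__main__":
    # Single direction, 4 processors, 8 cells per column: the Fig. 1 bound.
    stages, eff = sweep_efficiency(4, 8)
    print(stages, eff)  # 11 8/11

    # Pipelining 2M directions reproduces (2M x 8) / (3 + 2M x 8).
    for m in (1, 2, 8):
        stages, eff = sweep_efficiency(4, 8, 2 * m)
        assert eff == Fraction(2 * m * 8, 3 + 2 * m * 8)
        print(f"2M = {2 * m}: {stages} stages, efficiency {float(eff):.3f}")
```

In this model the pipe-fill cost stays fixed at P − 1 = 3 stages while the pipe length grows with the number of directions, so efficiency approaches 1 as 2M grows; adding processors increases the fill cost, which is the penalty described above.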
One purpose of this summary is to help dispel this belief. A sweep algorithm is defined by its partitioning (dividing the domain among processors), aggregation (grouping cells, directions, and energy groups into “tasks”), and scheduling (choosing which task to execute if more than one is available). The work presented here follows that